A Principled Approach to Bridging the Gap between Graph Data and their Schemas
Authors
Abstract
Although RDF graph data often come with an associated schema, recent studies have shown that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to have an accurate description of the structuredness of the data at hand (how well the data conform to the schema). In this paper, we approach the study of the structuredness of an RDF graph in a principled way: we propose a framework for specifying structuredness functions, which gauge the degree to which an RDF graph conforms to a schema. In particular, we first define a formal language for specifying structuredness functions with expressions we call rules. This language allows a user to state a rule to which an RDF graph may fully or partially conform. Then we consider the issue of discovering a refinement of a sort (type) by partitioning the dataset into subsets whose structuredness is over a specified threshold. In particular, we prove that the natural decision problem associated with this refinement problem is NP-complete, and we provide a natural translation of this problem into Integer Linear Programming (ILP). Finally, we test this ILP solution with three real-world datasets and three different and intuitive rules, which gauge the structuredness in different ways. We show that the rules give meaningful refinements of the datasets, which indicates that our language can be a powerful tool for understanding the structure of RDF data, and we show that the ILP solution is practical for a large fraction of existing data.
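To make the idea of a structuredness function concrete, the following is a minimal sketch in Python. It assumes a simple coverage-style measure (the fraction of subject/property cells that are filled) and is not the rule language defined in the paper. The function names `coverage`, `restrict`, and `partition_meets_threshold`, as well as the sample triples, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def coverage(triples):
    """Fraction of (subject, property) cells that are filled.

    Assumed measure for illustration only: 1.0 means every subject
    sets every property seen in the dataset.
    """
    props_by_subject = defaultdict(set)
    for subject, prop, _obj in triples:
        props_by_subject[subject].add(prop)
    if not props_by_subject:
        return 1.0  # assumption: an empty dataset counts as fully structured
    all_props = set().union(*props_by_subject.values())
    filled = sum(len(props) for props in props_by_subject.values())
    return filled / (len(props_by_subject) * len(all_props))


def restrict(triples, subjects):
    """Keep only the triples whose subject belongs to `subjects`."""
    return [t for t in triples if t[0] in subjects]


def partition_meets_threshold(triples, partition, threshold):
    """Check that every part of a subject partition reaches the threshold."""
    return all(
        coverage(restrict(triples, part)) >= threshold for part in partition
    )


# Hypothetical data: three entities of one sort, one of them lacks "email".
triples = [
    ("alice", "name", "Alice"),
    ("alice", "email", "a@example.org"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
    ("carol", "email", "c@example.org"),
]

whole = coverage(triples)  # 5 filled cells out of 3 subjects x 2 properties
refined = [{"alice", "carol"}, {"bob"}]
unrefined = [{"alice", "bob", "carol"}]
refined_ok = partition_meets_threshold(triples, refined, 0.9)
unrefined_ok = partition_meets_threshold(triples, unrefined, 0.9)
```

In this toy dataset the whole sort has coverage 5/6, while splitting it into the two subsets shown gives each subset a coverage of 1.0. This sketch only checks a given partition against a threshold. Finding such a partition is the refinement problem that the abstract describes as NP-complete and translates into ILP.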
Similar resources
Theorizing the Process of Transferring Theoretical Knowledge into Practice in Nursing: A Grounded Theory Approach
Introduction & Objective: Knowledge transfer, that is, the bridging of theory and practice, is one of the main concerns of all academic disciplines. Prominent professional status is achieved through knowledge-based practice, and a discipline can be called successful when it is able to transfer its theoretical, paradigmatic claims into practice. Accordingly,...
Cross border E-Science and Research Partnership: Bridging the Gap Between Science and Media
E-Science is a tool that helps scientists to store, interpret, analyze and make a network of their data, and it can play a critical role in different aspects of the scientific goals and research. This commentary, under the topic of Cross Border E-Science and Research Partnership: Bridging the Gap between Science and Media,[1] attempts to shed light on E-Science with emphasis on three importa...
Bridging the Gap Between Research and Policy and Practice; Comment on “CIHR Health System Impact Fellows: Reflections on ‘Driving Change’ Within the Health System”
Far too often, there is a gap between research and policy and practice. Too much research is undertaken with little relevance to real-life problems, or it is reported in ways that are obscure and impenetrable. At the same time, many policies are developed and implemented but are untouched by, or even contrary to, evidence. An accompanying paper describes an innovative progr...
Causes of the Gap between Junior High School Intended, Implemented, and Attained Curricula and Ways of Bridging It
M.A. Jamaalifar*, S. Sh. HaashemiMoghadam, Ph.D.**, Z. Aabedi Karajibaan, Ph.D.***, A.R. Faghihi, Ph.D.**** To identify the causes of the perceived gap between junior high school intended, implemented, and attained curricula, a group of 30 curriculum planners, 50 educationa...
Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques
Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been practiced in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides accuracy and confidence simultaneously before softwa...
Journal title:
- PVLDB
Volume 7, Issue
Pages -
Publication date: 2014